{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Installation"
      ],
      "metadata": {
        "id": "d9c1psNr_bNB"
      }
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "collapsed": true,
        "id": "WtcYuV2Kj_GU",
        "outputId": "ec06d952-6ce0-4628-8b1f-daa46ea9e72f",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\u001b[?25l   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/292.4 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K   \u001b[91m━━━━━━━━━\u001b[0m\u001b[91m╸\u001b[0m\u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m71.7/292.4 kB\u001b[0m \u001b[31m2.7 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\r\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m292.4/292.4 kB\u001b[0m \u001b[31m4.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h"
          ]
        }
      ],
      "source": [
        "# We will install the basic LangChain Google VertexAI package,\n",
        "# The LangChain Community package for access to loaders,\n",
        "# The LangChain Google Community package for speech-to-text API and\n",
        "# The text splitter package for working with long context\n",
        "! pip install -qU langchain langchain-google-vertexai langchain_community langchain_google_community[speech] langchain-text-splitters"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Automatically restart kernel after installs so that your environment can access the new packages\n",
        "import IPython\n",
        "\n",
        "app = IPython.Application.instance()\n",
        "app.kernel.do_shutdown(True)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "_JQB5s15k8iY",
        "outputId": "bc1f652e-9079-4821-a113-862f9c4e5c5d"
      },
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "{'status': 'ok', 'restart': True}"
            ]
          },
          "metadata": {},
          "execution_count": 3
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Only needed if you work with Google Colab\n",
        "# You can skip this if you use a Vertex AI Workbench\n",
        "\n",
        "import sys\n",
        "\n",
        "if \"google.colab\" in sys.modules:\n",
        "    from google.colab import auth\n",
        "\n",
        "    auth.authenticate_user()"
      ],
      "metadata": {
        "id": "9eVLLqKwkceV"
      },
      "execution_count": 1,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Set your project and region\n",
        "\n",
        "import vertexai\n",
        "\n",
        "PROJECT_ID = \"my-test-project\"  # @param {type:\"string\"}\n",
        "REGION = \"us-central1\"  # @param {type:\"string\"}\n",
        "\n",
        "# Initialize Vertex AI SDK\n",
        "vertexai.init(project=PROJECT_ID, location=REGION)"
      ],
      "metadata": {
        "id": "j_Xsv-lNkoKN"
      },
      "execution_count": 2,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Summarizing documents\n",
        "We will discuss two ways of summarizing long documents. The first way uses traditional map-reduce patterns to summarize documents across multiple LLM calls, while the second way makes use of the long context windows and multimodal capabilities of the latest LLMs to create summaries more efficiently. We will also quickly cover loaders to get documents into your chain, and then go deep into map-reduce patterns to give you a good understanding of what is going on under the hood."
      ],
      "metadata": {
        "id": "t-T8XOxVE1Qf"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Summarizing text\n",
        "We’ll start with summarizing text. Since most advanced modern LLMs have a long context window, we can  just put the whole text in the model’s context and wrap it with additional instructions  about what kind of summary we aim for. In practice, we use zero-shot learning for summarization tasks since examples are typically long. Let’s start with  Gemini release updates  ( (https://gemini.google.com/corp/updates) and ask Gemini to create a summary."
      ],
      "metadata": {
        "id": "AhcMvSaaCsWj"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from langchain.chains.summarize import load_summarize_chain\n",
        "from langchain_google_vertexai import ChatVertexAI\n",
        "from langchain_core.documents import Document\n",
        "\n",
        "google_gemini_release_updates = \"\"\"2024.10.09\n",
        "Introducing Imagen 3\n",
        "What: Create stunning images in Gemini with Imagen 3, our highest quality text-to-image model yet. Simply describe what you imagine, and watch as your ideas transform into visuals, bursting with vivid details and realism, in seconds.\n",
        "And starting in English, Gemini Advanced subscribers unlock the added capability of generating images featuring people, empowering even more variety in your creations.\n",
        "Why: Your imagination is limitless, and we want to place the power of creativity in your hands. We're also expanding image generation to more countries and languages, so even more people can bring their ideas to life with Gemini. While this model represents a leap forward in image generation quality and accuracy, we're constantly learning and will continue to improve to provide you the best experience possible.\n",
        "2024.10.06\n",
        "Gemini can access helpful information from more apps and services\n",
        "What: Over the coming days, Gemini will connect to more apps and services, letting you set reminders with Google Tasks, make and refine lists with Google Keep, and engage with playlists in YouTube Music. Learn more about extensions. Learn more about extensionsOpens in a new window.\n",
        "Why: We’re continuing to build new ways for Gemini to connect to the apps you rely on every day. Through these extensions, Gemini can coordinate across your apps and take actions for you, better than ever. Read moreOpens in a new window about how Gemini works with your favorite Assistant actions, including details on upcoming improvements.\n",
        "2024.10.02\n",
        "Gemini 1.5 Pro in Gemini Advanced just became more capable\n",
        "What: Gemini Advanced, with a chat optimized version of 1.5 Pro-002, now provides better and more accurate responses for prompts related to math and exploring complex topics that invite thoughtful conversation. Whether you’re providing detailed, multi-step instructions, or tackling advanced mathematical calculations, Gemini Advanced will help you navigate these challenges with greater ease.\n",
        "Why: We remain committed to rapid development and delivering the best of Gemini to our users. Through continuous model training and valuable user feedback, we’ve further enhanced the capabilities of Gemini 1.5 Pro, continuing to make Gemini Advanced better over time. Upgrade to Gemini Advanced\"\"\"\n",
        "llm = ChatVertexAI(temperature=0, model_name=\"gemini-1.5-pro\")\n",
        "chain = load_summarize_chain(llm, chain_type=\"stuff\")\n",
        "\n",
        "doc = Document(page_content=google_gemini_release_updates)\n",
        "\n",
        "result = chain.invoke([doc])\n",
        "print(result[\"output_text\"])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "jsot3E3YCrY8",
        "outputId": "d64dc44f-6cf2-4acc-c2db-27918db88d7e"
      },
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Google is rapidly enhancing its AI assistant, Gemini. Recent updates include:\n",
            "\n",
            "* **Imagen 3 integration:** Generate high-quality images from text prompts within Gemini, with expanded capabilities for Gemini Advanced subscribers.\n",
            "* **App integration:** Connect Gemini to Google Tasks, Keep, and YouTube Music for improved functionality and task management.\n",
            "* **Gemini 1.5 Pro enhancements:**  Improved accuracy and capabilities for complex tasks, math problems, and nuanced conversations within Gemini Advanced. \n",
            "\n",
            "These updates highlight Google's focus on making Gemini a powerful and versatile AI tool for a wide range of applications. \n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "LangChain is transparent about its inner workings. The following cell will let you inspect the template:"
      ],
      "metadata": {
        "id": "vcn0fRyHBUda"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "print(chain.document_prompt)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "4vbEZci3CvKL",
        "outputId": "9fefbd1a-6828-406b-885b-31b1b06ce765"
      },
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "input_variables=['page_content'] input_types={} partial_variables={} template='{page_content}'\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Sometimes you may need to combine documents into a single document. We use a chain for this. With `_get_inputs`, we can see what is going on inside the chain."
      ],
      "metadata": {
        "id": "QiQuMoGjBbyv"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "doc1 = Document(page_content=\"text1\")\n",
        "\n",
        "doc2 = Document(page_content=\"text2\")\n",
        "\n",
        "chain._get_inputs([doc1, doc2])"
      ],
      "metadata": {
        "id": "ZIESB36oCz4y",
        "outputId": "cdb62504-0fe1-4cf8-f666-f37af0f17a30",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "{'text': 'text1\\n\\ntext2'}"
            ]
          },
          "metadata": {},
          "execution_count": 7
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "And as always, we have full access to everything that is going on inside the chain, including its (deceptively simple) prompt."
      ],
      "metadata": {
        "id": "6vzMVoVRBt0j"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "print(chain.llm_chain.prompt)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "UktDTdAXBuFr",
        "outputId": "e8390740-e95e-4f8f-f2da-0e188fe34c70"
      },
      "execution_count": 8,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "input_variables=['text'] input_types={} partial_variables={} template='Write a concise summary of the following:\\n\\n\\n\"{text}\"\\n\\n\\nCONCISE SUMMARY:'\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "This also means that we can modify the behavior of the chain using our own prompt. Let's assume you want to significantly shorten the summary. You can add a `max_length` variable and pass that to the prompt."
      ],
      "metadata": {
        "id": "5Ji4cHYqCGoP"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from langchain_core.prompts.chat import ChatPromptTemplate\n",
        "\n",
        "prompt = ChatPromptTemplate.from_messages(\n",
        "   [(\"system\", \"Create an extractive summary in simple English with no more than {max_length} words\"),\n",
        "  (\"human\", \"Use the following context for the summary: \\n{context}\")]\n",
        ")\n",
        "\n",
        "chain = load_summarize_chain(\n",
        " llm, chain_type=\"stuff\", document_separator=\"\\n\",\n",
        " prompt=prompt, document_variable_name=\"context\")\n",
        "\n",
        "result = chain.invoke(\n",
        " input={\"input_documents\": [doc], \"max_length\": 20})\n",
        "print(result[\"output_text\"])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "t5TnC0tNCGxJ",
        "outputId": "ed7e8334-84f4-40c6-b5ce-c248a768f168"
      },
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Gemini gets better image creation, connects to more apps, and improves its advanced version. \n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "w we have control over the length of the output. Note that to provide additional arguments to the chain, we need to provide a full dictionary to the `input` key argument during invocation instead of providing a list of documents:"
      ],
      "metadata": {
        "id": "JSPEULrTCy7k"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "result = chain.invoke(\n",
        "  input={\"input_documents\": [doc], \"max_length\": 50})\n",
        "\n",
        "print(result[\"output_text\"])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "jIJn7MfXCzKe",
        "outputId": "a3f057a7-fd03-4806-ad3d-920e60c90098"
      },
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Google's Gemini is getting major updates. Gemini can now create realistic images from text with Imagen 3 and connect to more apps like Google Tasks and YouTube Music. Gemini Advanced users get an even smarter chatbot experience with improved math and reasoning skills. \n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Using Loaders\n",
        "Let's look at a practical example: You want to learn about all the amazing new concepts in the world of Generative AI, but you simply don't have the time to read through the many papers published each day. How about building a LangChain agent that can help you quickly summarize complex papers in simple English? We can use `WebBaseLoader` to load the HTML version of any arxiv paper (such as the example on multi-agent systems below) and then pass it to the model:"
      ],
      "metadata": {
        "id": "_jT-n4O9C6zb"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from langchain_community.document_loaders import WebBaseLoader\n",
        "\n",
        "loader = WebBaseLoader(\"https://arxiv.org/html/2312.10256v1\")\n",
        "# Note how the loader already returns LangChain Document types\n",
        "docs = loader.load()\n",
        "\n",
        "# We use the standard load_summarize_chain for summarizing the text.\n",
        "chain = load_summarize_chain(\n",
        " llm, chain_type=\"stuff\", document_separator=\"\\n\",\n",
        " prompt=prompt, document_variable_name=\"context\")\n",
        "\n",
        "result = chain.invoke(\n",
        "  input={\"input_documents\": docs, \"max_length\": 50})\n",
        "\n",
        "print(result[\"output_text\"])"
      ],
      "metadata": {
        "id": "S6g2NYU1DErN"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Map-reduce pattern\n",
        "What should we do if our document doesn’t fit into the model’s input window? We have a classic answer - Map-Reduce! This is a Big Data processing technology that has been with us for around 15 years. The idea is to split the original task into smaller ones that can be processed independently (Map phase) and then combine task outputs together (Reduce phase). Let's see a practical example for how this works."
      ],
      "metadata": {
        "id": "zDJBxXU4O1DK"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# First, we need to load our long pdf and split it into smaller chunks:\n",
        "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
        "\n",
        "loader = WebBaseLoader(\"https://arxiv.org/html/2312.10256v1\")\n",
        "doc = loader.load()\n",
        "\n",
        "# RecursiveCharacterTextSplitter can be configured to any chunk size.\n",
        "# Make sure the ensuing text fits into your models\n",
        "text_splitter = RecursiveCharacterTextSplitter(\n",
        "\n",
        "   chunk_size=2000, chunk_overlap=100,\n",
        "\n",
        "   length_function=len, is_separator_regex=False)\n",
        "\n",
        "docs = text_splitter.split_documents(doc)\n",
        "\n",
        "print(f\"I have created {len(docs)} documents\")\n",
        "\n",
        "# Next, we want to create the summary:\n",
        "from langchain_core.prompts import PromptTemplate\n",
        "\n",
        "prompt = PromptTemplate.from_template(\n",
        "   \"Collapse this content: {text}\"\n",
        ")\n",
        "\n",
        "# We create a load_summarize chain following the map_reduce pattern\n",
        "chain_mr = load_summarize_chain(llm, chain_type=\"map_reduce\", collapse_prompt=prompt)\n",
        "\n",
        "result = chain_mr.invoke(docs)\n",
        "print(\"Here is your summary: \" + result[\"output_text\"])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "OuVLkXuUmFuv",
        "outputId": "b9277b18-9e9c-4b7b-90dc-7b15981b5933"
      },
      "execution_count": 14,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "I have created 104 documents\n",
            "Here is your summary: Multi-agent reinforcement learning (MARL) is a rapidly evolving field that combines game theory and reinforcement learning to enable multiple agents to learn and interact effectively in shared environments. While facing challenges like coordination, communication, and non-stationarity, MARL leverages techniques like deep learning, value function factorization, and communication learning to develop intelligent agents. These agents are being applied to diverse domains, including robotics, game playing, and autonomous systems, showcasing MARL's potential to revolutionize various industries. \n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Of course you are able to inspect the prompt in the Map-Reduce Chain."
      ],
      "metadata": {
        "id": "vLrkksIaF3Tw"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "print(chain_mr.llm_chain.prompt)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "NizEitrZF88N",
        "outputId": "4cface52-df09-42a1-90c6-dcdbf0162c9b"
      },
      "execution_count": 16,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "input_variables=['text'] input_types={} partial_variables={} template='Write a concise summary of the following:\\n\\n\\n\"{text}\"\\n\\n\\nCONCISE SUMMARY:'\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Summarizing other modalities"
      ],
      "metadata": {
        "id": "NgZjINMuOwO1"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Let’s look at how we would process the arxiv paper we analysed in the previous section using multimodal capabilities: This time, we will get a .pdf file from a public Google Cloud Storage bucket and construct a HumanMessage that has two (or more) pieces of content. You can find the full code in our Github repository."
      ],
      "metadata": {
        "id": "SCsy_gFfOsBp"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from langchain_core.messages import HumanMessage, SystemMessage\n",
        "\n",
        "pdf_message = {\n",
        "   \"type\": \"image_url\",\n",
        "   \"image_url\": {\n",
        "       \"url\": \"gs://arxiv-dataset/arxiv/arxiv/pdf/2312/2312.10256v1.pdf\"\n",
        "   },\n",
        "}\n",
        "\n",
        "summarize_message = {\n",
        "   \"type\": \"text\",\n",
        "   \"text\": \"Create an extractive summary in simple English with no more than 50 words of the following documents.\",\n",
        "}\n",
        "\n",
        "message = HumanMessage(content=[pdf_message, summarize_message])\n",
        "result = llm.invoke([message])\n",
        "print(result.content)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "6UrqLSuyM56r",
        "outputId": "54e9d429-fe31-4a19-bf61-679e04bc8989"
      },
      "execution_count": 18,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Multi-agent reinforcement learning (MARL) is a type of artificial intelligence where multiple agents learn to make decisions in a shared environment. MARL is challenging because it involves coordinating the actions of multiple agents, which can lead to complex and unpredictable behaviors. Despite the challenges, MARL has the potential to solve complex problems that cannot be solved by single-agent systems. \n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "reduce_prompt = ChatPromptTemplate.from_messages(\n",
        "    [(\"system\", \"Create a summary in less than 200 words in simple English of these key points extract from a document:\\n{context}\")]\n",
        ")\n",
        "reduce_chain = reduce_prompt | llm | StrOutputParser()\n",
        "\n",
        "reduce_chain.invoke({\"context\":summary})"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 126
        },
        "id": "LvB1DC9Vtoaj",
        "outputId": "7e005c18-bfad-49b6-ca30-bee958393edd"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "'This document is a comprehensive survey of multi-agent reinforcement learning (MARL), a field that focuses on training multiple agents to interact and learn in complex environments. MARL offers benefits like enabling complex tasks and understanding social dynamics, but faces challenges like computational complexity, non-stationarity, coordination, and performance evaluation. The survey covers various aspects of MARL, including game theory, deep learning, reinforcement learning, communication, team-play, knowledge transfer, and agent modeling. It explores different training paradigms, communication strategies, and open research directions. The document also discusses the challenges of designing reward functions, handling imperfect information, and evaluating performance in multi-agent systems. \\n'"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 61
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Let’s look at another example for using a loader for processing multimodal information: [Google Cloud Speech-To-Text](https://cloud.google.com/speech-to-text/docs/optimizing-audio-files-for-speech-to-text#optimize_video_files_for_analysis ) services help us to transcribe audio or video into text that can be processed by any LLM. For the example below, assume you want to transcribe and process a recording of a presentation given by a colleague.  "
      ],
      "metadata": {
        "id": "ue2qEiZhPA6k"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from langchain_google_community import SpeechToTextLoader\n",
        "\n",
        "# First we load a speech file from the public domain\n",
        "! wget -O audio.flac https://upload.wikimedia.org/wikipedia/commons/a/ae/Rep._Vance_McAllister_on_Increasing_the_Mississippi_River_MRT.flac\n",
        "\n",
        "# Then we create a speech loader for this file\n",
        "speech_loader = SpeechToTextLoader(\n",
        "  project_id=PROJECT_ID, file_path=\"\"\"audio.flac\"\"\", is_long=False)\n",
        "\n",
        "audio_recording_docs = speech_loader.load()\n",
        "# Let's print out how long it is\n",
        "print(len(docs[0].page_content))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "zmqMwfahPWBL",
        "outputId": "eef68131-6eee-44c8-ec64-73d2ffe4b39c"
      },
      "execution_count": 60,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "--2024-10-12 20:33:17--  https://upload.wikimedia.org/wikipedia/commons/a/ae/Rep._Vance_McAllister_on_Increasing_the_Mississippi_River_MRT.flac\n",
            "Resolving upload.wikimedia.org (upload.wikimedia.org)... 208.80.153.240, 2620:0:860:ed1a::2:b\n",
            "Connecting to upload.wikimedia.org (upload.wikimedia.org)|208.80.153.240|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 6733047 (6.4M) [audio/x-flac]\n",
            "Saving to: ‘audio.flac’\n",
            "\n",
            "audio.flac          100%[===================>]   6.42M  15.1MB/s    in 0.4s    \n",
            "\n",
            "2024-10-12 20:33:18 (15.1 MB/s) - ‘audio.flac’ saved [6733047/6733047]\n",
            "\n",
            "1974\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now we can work with these documents as usual."
      ],
      "metadata": {
        "id": "vbbIKI46aaiV"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "chain = load_summarize_chain(\n",
        " llm, chain_type=\"stuff\", document_separator=\"\\n\",\n",
        " prompt=prompt)\n",
        "\n",
        "result = chain.invoke(\n",
        "  input={\"input_documents\": audio_recording_docs, \"max_length\": 50})\n",
        "print(result[\"output_text\"])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "gSg5nW5BaZpX",
        "outputId": "c7e9ff76-5055-47f1-8e7a-fc08f5a27656"
      },
      "execution_count": 59,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Despite the ongoing Padam Canal expansion project, continuous investment in data resources is crucial. These waterways are vital to our nation, and their significance outweighs any single finding. The committee deserves commendation for their diligent work on this complex challenge. \n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Another option is to pass the whole audio (or video) content directly to the model. In this case, we need to construct a special type of media message pointing to the file and mentioning the mime type of the content."
      ],
      "metadata": {
        "id": "A1SMXKAsbsuN"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import base64\n",
        "\n",
        "# We first need to encode our audio to base64 so we can pass it to the LLM\n",
        "with open(\"audio.flac\",\"rb\") as f:\n",
        "  audio = base64.b64encode(f.read()).decode('utf-8')\n",
        "\n",
        "# Then just construct a media message, making sure to give the correct mime_type\n",
        "media_message = {\n",
        "       \"type\": \"media\",\n",
        "       \"data\": audio,\n",
        "       \"mime_type\": \"audio/flac\",\n",
        "   }\n",
        "\n",
        "message = HumanMessage(content=[summarize_message, media_message])\n",
        "\n",
        "output = llm([message])\n",
        "\n",
        "print(output.content)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "HR5DzcExbs75",
        "outputId": "8632c79e-a8c0-483a-87c3-e3da2f4d5335"
      },
      "execution_count": 77,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The Mississippi River and its tributaries are vital for commerce and are currently experiencing flooding. The amendment proposes increasing funding for the Mississippi River and Tributaries project by $47 million to restore it to FY14 levels. This investment is crucial, especially with the Panama Canal expansion underway, and reducing funding would be detrimental to the nation. \n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Answering questions on long documents\n",
        "Summarization is just the first step in achieving the goal of our users: Answering questions on long documents! No one has time to read these long documents, and while summaries are helpful, what users really want is a clear, concise answer to a question. In this section, we will discuss how you can use the long context window of modern LLM to efficiently answer questions on long documents.\n",
        "\n",
        "With the introduction of Gemini 1.5 Pro, Google was able to significantly expand the context available for processing in one message to the LLM. Imagine that you can perform Q&A on a 132 pages long document with a single call! To easily enable you to work with long context, LangChain introduced new types of messages. Don’t be confused by the types: Since long context was born out of multimodal model capabilities (processing images and text in the same invocation), the message type for pdf messages is “image_url”.  "
      ],
      "metadata": {
        "id": "1O05UcUBoCe2"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Don't be confused by image_url when loading pdf. This started as an image loader\n",
        "pdf_message = {\n",
        "   \"type\": \"image_url\",\n",
        "   \"image_url\": {\"url\": \"gs://arxiv-dataset/arxiv/arxiv/pdf/2312/2312.10256v1.pdf\"},\n",
        "}\n",
        "\n",
        "text_message = {\n",
        "   \"type\": \"text\",\n",
        "   \"text\": \"What is the biggest challenge with MARL? Respond in 10 words or fewer.\",\n",
        "}\n",
        "\n",
        "# We simply add the pdf message to the model\n",
        "message = HumanMessage(content=[pdf_message, text_message])\n",
        "\n",
        "output = llm.invoke([message])\n",
        "print(output.content)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "i8iVnUHIoCAk",
        "outputId": "ed0d50fa-9cd7-429c-8197-4289f3f5499e"
      },
      "execution_count": 79,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Non-stationarity from other agents' adapting behaviors. \n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "You can also provide multiple PDFs and ask more advanced questions:"
      ],
      "metadata": {
        "id": "hnUb8pJgpS9p"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "pdf_message_1 = {\n",
        "   \"type\": \"image_url\",\n",
        "   \"image_url\": {\"url\": \"gs://arxiv-dataset/arxiv/arxiv/pdf/2312/2312.10256v1.pdf\"},\n",
        "}\n",
        "pdf_message_2 = {\n",
        "   \"type\": \"image_url\",\n",
        "   \"image_url\": {\"url\": \"gs://arxiv-dataset/arxiv/arxiv/pdf/2410/2410.01706v1.pdf\"},\n",
        "}\n",
        "\n",
        "text_message = {\n",
        "   \"type\": \"text\",\n",
        "   \"text\": \"What are the key differences between MARL and Sable? Respond in 50 words or fewer.\",\n",
        "}\n",
        "\n",
        "# We simply add the pdf message to the model\n",
        "message = HumanMessage(content=[pdf_message_1, pdf_message_2, text_message])\n",
        "\n",
        "output = llm.invoke([message])\n",
        "print(output.content)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "7HHVG0dspWEY",
        "outputId": "c2c52c5d-0ae8-489b-d6aa-a84fbdf2e213"
      },
      "execution_count": 82,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "MARL (Multi-Agent Reinforcement Learning) involves multiple agents learning in a shared environment, while Sable is a specific MARL algorithm that uses a memory-efficient sequence modeling approach with retention instead of attention. \n",
            "\n"
          ]
        }
      ]
    }
  ]
}
